Abstract

In this paper, we consider clustering based on principal component analysis (PCA) for high-dimension, low-sample-size (HDLSS) data. We give theoretical reasons why PCA is effective for clustering HDLSS data. First, we derive a geometric representation of HDLSS data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We extend the geometric representation and the geometric consistency properties to multiclass mixture models. We show that PCA can classify HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering by using microarray data sets.

Keywords: Clustering; Consistency; Geometric representation; HDLSS; Microarray; PC score


1 Introduction 


High-dimension, low-sample-size (HDLSS) data situations occur in many areas of modern science such as genetic microarrays, medical imaging, text recognition, finance, chemometrics, and so on. In recent years, substantial work has been done on HDLSS asymptotic theory, where the sample size $n$ is fixed or $n/d \to 0$ as the data dimension $d \to \infty$. Hall et al. (2005), Ahn et al. (2007), Yata and Aoshima (2012) and Lv (2013) explored several types of geometric representations of HDLSS data. Jung and Marron (2009) showed inconsistency properties of the sample eigenvalues and eigenvectors in the HDLSS context. Yata and Aoshima (2012) developed the noise-reduction methodology to give consistent estimators of both the eigenvalues and eigenvectors together with principal component (PC) scores in the HDLSS context. Hellton and Thoresen (2014) also gave several asymptotic properties of the sample PC scores in the HDLSS context. On the other hand, the asymptotic behavior of the sample eigenvalues in high-dimension, large-sample-size data situations such as $n/d \to c > 0$ was studied by Johnstone (2001) and several other works.

The HDLSS asymptotic theory was created under the assumption that either the population distribution is Gaussian or the random variables in a sphered data matrix have a $\rho$-mixing dependency. However, Yata and Aoshima (2010) developed an HDLSS asymptotic theory without such assumptions. Moreover, they created a new principal component analysis (PCA) called the cross-data-matrix methodology that is applicable to constructing an unbiased estimator in HDLSS nonparametric settings. Meanwhile, PCA is quite popular for clustering high-dimensional data; see Section 9.2 in Jolliffe (2002) for details. For clustering HDLSS gene expression data, see Armstrong et al. (2002) and Pomeroy et al. (2002). Liu et al. (2008) and Ahn et al. (2012) gave binary-split-type clustering methods for HDLSS data. Given this background, we decided to focus on high-dimensional structures of multiclass mixture models. In this paper, we consider asymptotic properties of PC scores for high-dimensional mixture models in order to apply them to cluster analysis in HDLSS settings. The main contribution of this paper is that we give theoretical reasons why PCA is effective for clustering HDLSS data.

Suppose there are independent $d$-variate populations, $\Pi_i$, $i = 1,\dots,k$, having an unknown mean vector $\mu_i$ and unknown covariance matrix $\Sigma_i$ for each $i$. We do not assume $\Sigma_1 = \cdots = \Sigma_k$. The eigen-decomposition of $\Sigma_i$ is given by $\Sigma_i = H_i\Lambda_iH_i^T$, where $\Lambda_i = \mathrm{diag}(\lambda_{i1},\dots,\lambda_{id})$ has eigenvalues $\lambda_{i1}\ge\cdots\ge\lambda_{id}\ge 0$ and $H_i$ is an orthogonal matrix of the corresponding eigenvectors. We consider a mixture model to classify a data set into $k\ (\ge 2)$ groups. We assume that each sample is taken from $\Pi_i$ with mixing proportion $\varepsilon_i$, where $\varepsilon_i\in(0,1)$ and $\sum_{i=1}^{k}\varepsilon_i = 1$, but the label of the population is missing. We assume that the $\varepsilon_i$'s are independent of $d$. We consider a mixture model whose probability density function (or probability function) is given by




$$f(x) = \sum_{i=1}^{k}\varepsilon_i\,\pi_i(x;\mu_i,\Sigma_i), \qquad (1)$$


where $x\in\mathbb{R}^d$ and $\pi_i(x;\mu_i,\Sigma_i)$ is a $d$-dimensional probability density function (or probability function) of $\Pi_i$ having mean vector $\mu_i$ and covariance matrix $\Sigma_i$. Suppose we have a $d\times n$ data matrix $X = (x_1,\dots,x_n)$, where $x_j$, $j = 1,\dots,n$, are independently taken from (1). We assume $n\ge k$. Let $n_i = \#\{j \mid x_j\in\Pi_i \text{ for } j = 1,\dots,n\}$ and $\eta_i = n_i/n$ for $i = 1,\dots,k$, where $\#A$ denotes the number of elements in a set $A$. We assume that $n$ and the $n_i$'s are independent of $d$. Let $\mu$ and $\Sigma$ be the mean vector and the covariance matrix of (1). Then, we have that $\mu = \sum_{i=1}^{k}\varepsilon_i\mu_i$ and
$$\Sigma = \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\varepsilon_i\varepsilon_j(\mu_i-\mu_j)(\mu_i-\mu_j)^T + \sum_{i=1}^{k}\varepsilon_i\Sigma_i.$$
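The covariance formula for (1) can be checked numerically against the law of total covariance, $\Sigma = \sum_i\varepsilon_i\Sigma_i + \sum_i\varepsilon_i(\mu_i-\mu)(\mu_i-\mu)^T$. The following is a minimal Python/NumPy sketch. The function names and the randomly generated parameters are illustrative assumptions and are not part of the paper.

```python
import numpy as np

def mixture_mean_cov(eps, mus, sigmas):
    """Mean and covariance of the k-class mixture (1) via the pairwise formula.

    eps: (k,) mixing proportions summing to 1; mus: (k, d); sigmas: (k, d, d).
    """
    eps = np.asarray(eps, dtype=float)
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    k = eps.size
    mu = eps @ mus                                   # sum_i eps_i mu_i
    sigma = np.einsum("i,ijk->jk", eps, sigmas)      # sum_i eps_i Sigma_i
    for i in range(k - 1):
        for j in range(i + 1, k):
            diff = mus[i] - mus[j]                   # mu_i - mu_j
            sigma += eps[i] * eps[j] * np.outer(diff, diff)
    return mu, sigma

def total_covariance(eps, mus, sigmas):
    """Same covariance from the law of total covariance, as a cross-check."""
    eps = np.asarray(eps, dtype=float)
    mus = np.asarray(mus, dtype=float)
    centered = mus - eps @ mus                       # mu_i - mu
    between = np.einsum("i,ij,ik->jk", eps, centered, centered)
    within = np.einsum("i,ijk->jk", eps, np.asarray(sigmas, dtype=float))
    return within + between

rng = np.random.default_rng(0)
eps = np.array([0.5, 0.3, 0.2])                      # hypothetical proportions
mus = rng.normal(size=(3, 5))
A = rng.normal(size=(3, 5, 5))
sigmas = A @ A.transpose(0, 2, 1)                    # random SPD covariances
mu, sigma = mixture_mean_cov(eps, mus, sigmas)
print(np.allclose(sigma, total_covariance(eps, mus, sigmas)))
```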

We note that $E(x \mid x\in\Pi_i) = \mu_i$ and $\mathrm{Var}(x \mid x\in\Pi_i) = \Sigma_i$ for $i = 1,\dots,k$. We denote the eigen-decomposition of $\Sigma$ by $\Sigma = H\Lambda H^T$. Here $\Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_d)$ has eigenvalues $\lambda_1\ge\cdots\ge\lambda_d\ge 0$, and $H = (h_1,\dots,h_d)$ is an orthogonal matrix of the corresponding eigenvectors. Let $x_j-\mu = H\Lambda^{1/2}(z_{1j},\dots,z_{dj})^T$ for $j = 1,\dots,n$. Then, $(z_{1j},\dots,z_{dj})^T$ is a sphered data vector from a distribution with the identity covariance matrix. The $i$th true PC score of $x_j$ is given by $h_i^T(x_j-\mu) = \lambda_i^{1/2}z_{ij}$ (hereafter called $s_{ij}$). We note that $\mathrm{Var}(s_{ij}) = \lambda_i$ for all $i,j$. Let $\mu_{i,j} = \mu_i-\mu_j$ and $\Delta_{i,j} = \|\mu_{i,j}\|^2$ for $i,j = 1,\dots,k$ ($i<j$), where $\|\cdot\|$ denotes the Euclidean norm. Let $\Delta_{\min} = \min_{1\le i<j\le k}\Delta_{i,j}$. We note that $\Delta_{\min} = \Delta_{1,2}$ when $k = 2$. Since the sign of an eigenvector is arbitrary, we assume that $h_i^T\mu_{i,i+1}\ge 0$ for $i = 1,\dots,k-1$, without loss of generality. In addition, for the largest eigenvalues $\lambda_{i1}$, we assume the following condition as necessary:

Condition 1. $\max_{i=1,\dots,k}\lambda_{i1}/\Delta_{\min}\to 0$ as $d\to\infty$.
We consider clustering $x_1,\dots,x_n$ into the $\Pi_i$'s in HDLSS situations. When $k = 2$, Yata and Aoshima (2010) gave the following result. We denote the angle between two vectors $x$ and $y$ by $\mathrm{Angle}(x,y) = \cos^{-1}\{x^Ty/(\|x\|\cdot\|y\|)\}$. Under Condition 1, it holds that as $d\to\infty$
$$\frac{\lambda_1}{\varepsilon_1\varepsilon_2\Delta_{1,2}}\to 1 \quad\text{and}\quad \mathrm{Angle}(h_1,\mu_{1,2})\to 0. \qquad (2)$$
Furthermore, for the normalized first PC score ($= z_{1j}$), it follows that
$$\mathrm{plim}_{d\to\infty}\frac{s_{1j}}{\lambda_1^{1/2}} = \begin{cases}(\varepsilon_2/\varepsilon_1)^{1/2} & \text{when } x_j\in\Pi_1,\\ -(\varepsilon_1/\varepsilon_2)^{1/2} & \text{when } x_j\in\Pi_2\end{cases} \qquad (3)$$
for $j = 1,\dots,n$. Here, "plim" denotes convergence in probability. One would be able to classify the $x_j$'s into two groups if $s_{1j}$ could be estimated accurately in HDLSS situations.

In this paper, we consider asymptotic properties of sample PC scores for (1) in the HDLSS context such as $d\to\infty$ while $n$ is fixed. In Section 2, we first derive a geometric representation of HDLSS data taken from the two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample PC scores in the HDLSS context. We show that PCA can classify HDLSS data under certain conditions in a surprisingly explicit way. In Section 3, we investigate asymptotic behaviors of true PC scores for the $k\ (\ge 3)$-class mixture model and provide geometric consistency properties of sample PC scores when $k\ge 3$. In Section 4, we demonstrate the performance of clustering based on sample PC scores by using microarray data sets. We show that the real HDLSS data sets hold the geometric consistency properties.

2 PC scores for two-class mixture model 


2.1 Preliminary 

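The sample covariance matrix is given by
$$S = (n-1)^{-1}(X-\bar X)(X-\bar X)^T = (n-1)^{-1}\sum_{j=1}^{n}(x_j-\bar x_n)(x_j-\bar x_n)^T,$$
where $\bar x_n = n^{-1}\sum_{j=1}^{n}x_j$ and $\bar X = \bar x_n1_n^T$ with $1_n = (1,\dots,1)^T\in\mathbb{R}^n$. Then, we define the $n\times n$ dual sample covariance matrix by $S_D = (n-1)^{-1}(X-\bar X)^T(X-\bar X)$. We note that $\mathrm{rank}(S_D)\le n-1$. Let $\hat\lambda_1\ge\cdots\ge\hat\lambda_{n-1}\ge 0$ be the eigenvalues of $S_D$. Then, we define the eigen-decomposition of $S_D$ by $S_D = \sum_{i=1}^{n-1}\hat\lambda_i\hat u_i\hat u_i^T$, where $\hat u_i = (\hat u_{i1},\dots,\hat u_{in})^T$ denotes a unit eigenvector corresponding to $\hat\lambda_i$. Since the sign of the $\hat u_i$'s is arbitrary, we assume $\hat u_i^Tz_i\ge 0$ for all $i$ without loss of generality, where $z_i = (z_{i1},\dots,z_{in})^T$. Note that $S$ and $S_D$ share the non-zero eigenvalues. Let $\hat z_{ij} = n^{1/2}\hat u_{ij}$ for $i = 1,\dots,n-1$; $j = 1,\dots,n$. We note that $\hat z_{ij}$ is an estimate of $s_{ij}/\lambda_i^{1/2}$ ($= z_{ij}$) for $i = 1,\dots,n-1$; $j = 1,\dots,n$. This follows from two facts: $\hat z_{ij} = \{n/(n-1)\}^{1/2}\hat h_i^T(x_j-\bar x_n)/\hat\lambda_i^{1/2}$, and $\|\hat h_i\| = 1$ if $\hat\lambda_i>0$. Here $\hat h_i = \{(n-1)\hat\lambda_i\}^{-1/2}(X-\bar X)\hat u_i$ is a unit eigenvector of $S$ corresponding to $\hat\lambda_i$. Let $X_0 = X-\mu1_n^T$ and $P_n = I_n-n^{-1}1_n1_n^T$, where $I_n$ denotes the identity matrix of size $n$. We note that $S_D = P_nX_0^TX_0P_n/(n-1)$. We consider the sphericity condition: $\mathrm{tr}(\Sigma^2)/\mathrm{tr}(\Sigma)^2\to 0$ as $d\to\infty$. When one can assume that $X$ is Gaussian or $Z = (z_{ij})$ is $\rho$-mixing, Ahn et al. (2007) and Jung and Marron (2009) gave a geometric representation as follows:
$$\mathrm{plim}_{d\to\infty}\frac{(n-1)\hat\lambda_i}{\mathrm{tr}(\Sigma)} = 1 \ \text{ for } i = 1,\dots,n-1, \quad\text{so that}\quad \mathrm{plim}_{d\to\infty}\frac{(n-1)S_D}{\mathrm{tr}(\Sigma)} = P_n. \qquad (4)$$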
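The dual-matrix construction above avoids forming the $d\times d$ matrix $S$. Below is a minimal Python/NumPy sketch. It assumes the data are stored column-wise as the $d\times n$ matrix $X$, and the helper name `dual_pc_scores` is hypothetical. Eigenvector signs are left as returned by `numpy.linalg.eigh`, because the convention $\hat u_i^Tz_i\ge 0$ needs the unobservable $z_i$.

```python
import numpy as np

def dual_pc_scores(X):
    """Sample eigenvalues and normalized PC scores via the n x n dual matrix S_D.

    X: d x n data matrix. Returns (lam_hat, Z_hat): lam_hat holds the n - 1 largest
    eigenvalues of S_D in decreasing order, and Z_hat[i, j] = n^{1/2} u_hat_ij.
    """
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # X - Xbar
    S_D = Xc.T @ Xc / (n - 1)                    # n x n, cheap when d >> n
    vals, vecs = np.linalg.eigh(S_D)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][: n - 1]      # keep the n - 1 largest
    return vals[order], np.sqrt(n) * vecs[:, order].T

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))                     # hypothetical data, d = 50, n = 8
lam_hat, Z_hat = dual_pc_scores(X)
print(np.round(lam_hat, 3))
```

Since $\hat u_i^T1_n = 0$ and $\|\hat u_i\| = 1$ for $\hat\lambda_i>0$, each row of `Z_hat` sums to zero and has mean square one. This gives a quick check on an implementation.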


Remark 1. Yata and Aoshima (2012) showed that (4) holds under the sphericity condition and $\mathrm{Var}(\|x_j-\mu\|^2)/\mathrm{tr}(\Sigma)^2\to 0$ as $d\to\infty$.

From (4), we observe that the eigenvalues become deterministic as the dimension grows, while the eigenvectors of $S_D$ do not uniquely determine a direction. We note that (1) does not satisfy the assumption that $X$ is Gaussian or $Z$ is $\rho$-mixing. See Section 4.1.1 in Qiao et al. (2010) for details.
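To see (4) numerically, one can simulate the simplest case covered by the sphericity condition: Gaussian data with $\Sigma = I_d$, so that $\mathrm{tr}(\Sigma) = d$. This choice is an assumption made only for illustration. The sketch (Python/NumPy) tracks how far $(n-1)S_D/\mathrm{tr}(\Sigma)$ is from $P_n$ as $d$ grows.

```python
import numpy as np

def scaled_dual_matrix(X, trace_sigma):
    """Return (n - 1) S_D / tr(Sigma) for a d x n data matrix X."""
    Xc = X - X.mean(axis=1, keepdims=True)   # X - Xbar
    return Xc.T @ Xc / trace_sigma           # (n - 1) S_D = (X - Xbar)^T (X - Xbar)

rng = np.random.default_rng(2)
n = 4
P_n = np.eye(n) - np.ones((n, n)) / n
for d in (10, 1_000, 100_000):
    X = rng.standard_normal((d, n))          # assumed Sigma = I_d, so tr(Sigma) = d
    err = np.abs(scaled_dual_matrix(X, d) - P_n).max()
    print(f"d = {d:>7}: max |(n-1)S_D/tr(Sigma) - P_n| = {err:.3f}")
```

$P_n$ has the single nonzero eigenvalue 1 with multiplicity $n-1$. In this regime, therefore, the eigenvectors of $S_D$ carry no directional information, which is the point made after Remark 1.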


2.2 Geometric representation and consistency property of PC scores when k = 2


We will find a geometric representation for (1), and the finding is completely different from (4). We assume the following conditions:

Condition 2. $\max_{i=1,\dots,k}\mathrm{tr}(\Sigma_i^2)/\Delta_{\min}^2\to 0$ as $d\to\infty$.

Condition 3. $\max_{i=1,\dots,k}\mathrm{Var}(\|x-\mu_i\|^2 \mid x\in\Pi_i)/\Delta_{\min}^2\to 0$ as $d\to\infty$.

Condition 4. $\{\mathrm{tr}(\Sigma_i)-\mathrm{tr}(\Sigma_j)\}/\Delta_{\min}\to 0$ as $d\to\infty$ for all $i,j = 1,\dots,k$ ($i<j$).

Remark 2. If the $\Pi_i$'s are Gaussian, it holds that $\mathrm{Var}(\|x-\mu_i\|^2 \mid x\in\Pi_i) = 2\mathrm{tr}(\Sigma_i^2)$ for $i = 1,\dots,k$, so that Condition 3 holds under Condition 2. On the other hand, Condition 2 is stronger than Condition 1 since $\lambda_{i1}^2\le\mathrm{tr}(\Sigma_i^2)$ for $i = 1,\dots,k$.


We define $r_j = (-1)^{i+1}(1-\eta_i)$ according to $x_j\in\Pi_i$ for $j = 1,\dots,n$. The following result gives a geometric representation for (1) when $k = 2$.

Theorem 1. Assume $\Delta_{1,2}/\mathrm{tr}(\Sigma)\to c\ (>0)$ as $d\to\infty$. Under Conditions 2 to 4, it holds that
$$\mathrm{plim}_{d\to\infty}\frac{(n-1)S_D}{\mathrm{tr}(\Sigma)} = c\,rr^T + (1-\varepsilon_1\varepsilon_2c)P_n, \qquad (5)$$
where $r = (r_1,\dots,r_n)^T$.
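The limit matrix in (5) is easy to form explicitly. The sketch below (Python/NumPy) builds $r$ from population labels and confirms that the leading eigenvector of $c\,rr^T+(1-\varepsilon_1\varepsilon_2c)P_n$ is $r/\|r\|$. The values $\varepsilon_1 = 1/3$ and $c = 1/2$ are hypothetical, and the function name is illustrative.

```python
import numpy as np

def theorem1_limit(labels, eps1, c):
    """Limit c r r^T + (1 - eps1 eps2 c) P_n in (5) for k = 2.

    labels: sequence of 1s and 2s giving the population of each x_j.
    eps1: mixing proportion of Pi_1; c: limit of Delta_{1,2} / tr(Sigma).
    """
    labels = np.asarray(labels)
    n = labels.size
    eta1 = np.mean(labels == 1)
    eta2 = 1.0 - eta1
    # r_j = (-1)^(i+1) (1 - eta_i) when x_j belongs to Pi_i
    r = np.where(labels == 1, 1.0 - eta1, -(1.0 - eta2))
    P_n = np.eye(n) - np.ones((n, n)) / n
    eps2 = 1.0 - eps1
    return c * np.outer(r, r) + (1.0 - eps1 * eps2 * c) * P_n, r

M, r = theorem1_limit([1, 2, 2], eps1=1/3, c=0.5)   # hypothetical eps_1 and c
print(np.round(np.linalg.eigvalsh(M), 4))
```

For these values the eigenvalues are $11/9$ (along $r$), $8/9$, and $0$ (along $1_n$). The first eigenvector is therefore separated from the rest whenever $c>0$.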

From (5), the first eigenvector of $S_D$ uniquely determines the direction. In fact, by noting $\|r\|^2 = n\eta_1\eta_2$, we have the following results for the first eigenvector and PC scores when $k = 2$. By using Corollary 1, one can classify the $x_j$'s into two groups by the sign of the $\hat z_{1j}$'s.

Corollary 1. Under Conditions 2 to 4, it holds that for $n_i>0$, $i = 1,2$,
$$\mathrm{plim}_{d\to\infty}\hat u_1 = \frac{r}{\sqrt{n\eta_1\eta_2}} \quad\text{and}\quad \mathrm{plim}_{d\to\infty}\hat z_{1j} = \begin{cases}(\eta_2/\eta_1)^{1/2} & \text{when } x_j\in\Pi_1,\\ -(\eta_1/\eta_2)^{1/2} & \text{when } x_j\in\Pi_2\end{cases} \quad\text{for } j = 1,\dots,n.$$
[Figure 1: panels (a) d = 5, (b) d = 50, (c) d = 500, (d) d = 5000]

Figure 1: Toy example to illustrate the geometric representation of $\hat u_1$ on the unit sphere when $k = 2$ and $n = 3$. We plotted 20 independent pairs of $\pm\hat u_1$ when $x_1\in\Pi_1$ and $x_2,x_3\in\Pi_2$. The solid line denotes $r = (2/3,-1/3,-1/3)^T$ and the dotted line denotes $1_n = (1,1,1)^T$.


We considered an easy example such as $\Pi_i: N_d(\mu_i,\Sigma_i)$, $i = 1,2$, with $\mu_1 = 0$, $\mu_2 = 1_d$, $\Sigma_1 = (0.3^{|i-j|^{1/3}})$ and $\Sigma_2 = B(0.3^{|i-j|^{1/3}})B$, where $B = \mathrm{diag}[-\{0.5+1/(d+1)\}^{1/2}, \{0.5+2/(d+1)\}^{1/2}, \dots, (-1)^d\{0.5+d/(d+1)\}^{1/2}]$. We note that $\Delta_{1,2} = d$ and $\Sigma_1\ne\Sigma_2$ but $\mathrm{tr}(\Sigma_1) = \mathrm{tr}(\Sigma_2) = d$. Then, Conditions 2 to 4 hold. We set $n_1 = 1$ and $n_2 = 2$; that is, we took $n = 3$ samples as $x_1\in\Pi_1$ and $x_2,x_3\in\Pi_2$. In Fig. 1, we displayed scatter plots of 20 independent pairs of $\pm\hat u_1$ when (a) $d = 5$, (b) $d = 50$, (c) $d = 500$ and (d) $d = 5000$. We denoted $r = (2/3,-1/3,-1/3)^T$ by the solid line and $1_n = (1,1,1)^T$ by the dotted line. We note that $\hat u_1^T1_n = 0$ when $S_D\ne O$. We observed that all the plots of $\pm\hat u_1$ lie in the orthogonal complement of $1_n$. Also, the plots appeared closer to the direction of $r$ as $d$ increases. Thus, one can classify the $x_j$'s into two groups by the sign of the $\hat z_{1j}$'s. If one cannot assume Condition 3 or 4, we recommend estimating PC scores by using the cross-data-matrix methodology given by Yata and Aoshima (2010). See Yata and Aoshima (2010, 2013) for the details.
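A rough reproduction of this toy experiment is sketched below in Python/NumPy. To stay cheap for very large $d$, it replaces $(0.3^{|i-j|^{1/3}})$ by the identity; that is, it assumes $\Sigma_1 = I_d$ and $\Sigma_2 = B^2$. These choices still satisfy $\Delta_{1,2} = d$, $\mathrm{tr}(\Sigma_1) = \mathrm{tr}(\Sigma_2) = d$ and Conditions 2 to 4. The arbitrary eigenvector sign is fixed using the known label of $x_1$, which is possible only in a simulation.

```python
import numpy as np

def toy_first_scores(d, rng):
    """z_hat_1 for one draw of x_1 in Pi_1 and x_2, x_3 in Pi_2 (n = 3).

    Simplifying assumption: Sigma_1 = I_d and Sigma_2 = B I_d B = B^2 with B as in
    the text, instead of the (0.3^{|i-j|^{1/3}}) matrices used for Fig. 1.
    """
    idx = np.arange(1, d + 1)
    b = (-1.0) ** idx * np.sqrt(0.5 + idx / (d + 1))   # diagonal of B, tr(B^2) = d
    mu1, mu2 = np.zeros(d), np.ones(d)                 # Delta_{1,2} = d
    x1 = mu1 + rng.standard_normal(d)
    x2 = mu2 + b * rng.standard_normal(d)
    x3 = mu2 + b * rng.standard_normal(d)
    X = np.column_stack([x1, x2, x3])
    Xc = X - X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / 2)         # S_D with n - 1 = 2
    return np.sqrt(3) * vecs[:, -1]                    # n^{1/2} u_hat_1, sign arbitrary

z = toy_first_scores(200_000, np.random.default_rng(0))
z = z * np.sign(z[0])     # orient so that x_1 (from Pi_1) has a positive score
print(np.round(z, 3))     # Corollary 1 limit: (2^{1/2}, -2^{-1/2}, -2^{-1/2})
```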


3 PC scores for multiclass mixture model 

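3.1 Asymptotic behaviors of true PC scores when k ≥ 3

We consider PC scores for the $k\ (\ge 3)$-class mixture model. Let $\varepsilon_{(0)} = 0$ and $\varepsilon_{(i)} = \sum_{j=1}^{i}\varepsilon_j$ for $i = 1,\dots,k$. We assume the following condition:

Condition 5. $\mathrm{Angle}(\mu_{i,i+1},\mu_{j,j+1})\to\pi/2$ and $\Delta_{j,j+1}/\Delta_{i,i+1}\to 0$ as $d\to\infty$ for $i,j = 1,\dots,k-1$ ($i<j$).

We note that $\Delta_{k-1,k}/\Delta_{\min}\to 1$ as $d\to\infty$ under Condition 5. Then, we have the following results.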
following results. 


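Theorem 2. Under Conditions 1 and 5, it holds that for $i = 1,\dots,k-1$; $j = 1,\dots,n$
$$\mathrm{plim}_{d\to\infty}\frac{s_{ij}}{\lambda_i^{1/2}} = \begin{cases}0 & \text{when } i\ge 2 \text{ and } x_j\in\bigcup_{i'=1}^{i-1}\Pi_{i'},\\ \sqrt{\dfrac{1-\varepsilon_{(i)}}{\varepsilon_i(1-\varepsilon_{(i-1)})}} & \text{when } x_j\in\Pi_i,\\ -\sqrt{\dfrac{\varepsilon_i}{(1-\varepsilon_{(i)})(1-\varepsilon_{(i-1)})}} & \text{when } x_j\in\bigcup_{m=i+1}^{k}\Pi_m.\end{cases} \qquad (6)$$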
[Figure 2: panels (a) d = 100, (b) d = 1000, (c) d = 10000]

Figure 2: Toy example to illustrate the asymptotic behaviors of true PC scores when $k = 3$. We plotted $(z_{1j},z_{2j})$, shown as small circles when $x_j\in\Pi_1$, small triangles when $x_j\in\Pi_2$, and small squares when $x_j\in\Pi_3$. The dashed triangle has the three vertices $(1,0)$, $(-1,2^{1/2})$ and $(-1,-2^{1/2})$, which are the theoretical convergent points.


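Remark 3. (6) is equivalent to (3) with $k = 2$ and $i = 1$.

Corollary 2. Under Conditions 1 and 5, it holds that for $i = 1,\dots,k-1$
$$\frac{\lambda_i}{\varepsilon_i(1-\varepsilon_{(i)})\Delta_{i,i+1}/(1-\varepsilon_{(i-1)})}\to 1 \quad\text{and}\quad \mathrm{Angle}(h_i,\mu_{i,i+1})\to 0 \quad\text{as } d\to\infty.$$

For example, when $k = 3$, from (6) we have that for $j = 1,\dots,n$
$$\mathrm{plim}_{d\to\infty}\frac{s_{1j}}{\lambda_1^{1/2}} = \begin{cases}\sqrt{(1-\varepsilon_1)/\varepsilon_1} & \text{when } x_j\in\Pi_1,\\ -\sqrt{\varepsilon_1/(1-\varepsilon_1)} & \text{when } x_j\notin\Pi_1\end{cases} \quad\text{and}\quad \mathrm{plim}_{d\to\infty}\frac{s_{2j}}{\lambda_2^{1/2}} = \begin{cases}0 & \text{when } x_j\in\Pi_1,\\ \sqrt{\varepsilon_3/\{\varepsilon_2(1-\varepsilon_1)\}} & \text{when } x_j\in\Pi_2,\\ -\sqrt{\varepsilon_2/\{\varepsilon_3(1-\varepsilon_1)\}} & \text{when } x_j\in\Pi_3.\end{cases}$$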

One can check whether $x_j\in\Pi_1$ or not by the first PC score. If $x_j\notin\Pi_1$, one can check whether $x_j\in\Pi_2$ or $x_j\in\Pi_3$ by the second PC score. In general, one can classify the $x_j$'s by using at most the first $k-1$ PC scores.
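The limits in (6) depend only on the mixing proportions, so the vertices shown in Figs. 2 and 3 can be computed directly. Below is a small Python/NumPy sketch; the function name is hypothetical.

```python
import numpy as np

def true_score_limits(eps):
    """Right-hand side of (6) as a (k-1) x k array.

    Row i-1 corresponds to the normalized score s_ij / lambda_i^{1/2}; column m holds
    its limit when x_j belongs to Pi_{m+1}. eps = (eps_1, ..., eps_k) sums to 1.
    """
    eps = np.asarray(eps, dtype=float)
    k = eps.size
    cum = np.concatenate([[0.0], np.cumsum(eps)])      # cum[i] = eps_(i), eps_(0) = 0
    L = np.zeros((k - 1, k))                           # zeros cover Pi_1, ..., Pi_{i-1}
    for i in range(1, k):
        L[i - 1, i - 1] = np.sqrt((1 - cum[i]) / (eps[i - 1] * (1 - cum[i - 1])))
        L[i - 1, i:] = -np.sqrt(eps[i - 1] / ((1 - cum[i]) * (1 - cum[i - 1])))
    return L

print(np.round(true_score_limits([0.5, 0.25, 0.25]), 3))   # triangle vertices in Fig. 2
print(np.round(true_score_limits([0.25] * 4), 3))          # pyramid vertices in Fig. 3
```

Each row has $\varepsilon$-weighted mean 0 and $\varepsilon$-weighted mean square 1. This is consistent with $E(z_{ij}) = 0$ and $\mathrm{Var}(z_{ij}) = 1$.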

We considered a toy example such as $\Pi_i: N_d(\mu_i,\Sigma_i)$, $i = 1,\dots,4$, where $\mu_1 = 1_d$, $\mu_2 = (1,\dots,1,0,\dots,0)^T$ whose first $\lceil\dots\rceil$ elements are 1, $\mu_3 = (1,\dots,1,0,\dots,0)^T$ whose first $\lceil\dots\rceil$ elements are 1, and $\mu_4 = 0$. Here, $\lceil\cdot\rceil$ denotes the ceiling function. We set $\Sigma_1 = (0.3^{|i-j|^{1/3}})$, $\Sigma_2 = B(0.3^{|i-j|^{1/3}})B$, $\Sigma_3 = 0.8\Sigma_1$ and $\Sigma_4 = 1.2\Sigma_2$, where $B$ is defined in Section 2.2. Then, Conditions 1 and 5 hold. We first considered the case when $k = 3$: $\Pi_i$, $i = 1,2,3$, having $(\varepsilon_1,\varepsilon_2,\varepsilon_3) = (1/2,1/4,1/4)$. We set $n = 20$ and $(n_1,n_2,n_3) = (10,5,5)$. From Theorem 2, one can expect that $(z_{1j},z_{2j})$ ($= (s_{1j}/\lambda_1^{1/2}, s_{2j}/\lambda_2^{1/2})$) becomes close to $(1,0)$ when $x_j\in\Pi_1$, $(-1,2^{1/2})$ when $x_j\in\Pi_2$, and $(-1,-2^{1/2})$ when $x_j\in\Pi_3$. In Fig. 2, we displayed scatter plots of $(z_{1j},z_{2j})$, $j = 1,\dots,n$, when (a) $d = 100$, (b) $d = 1000$ and (c) $d = 10000$. We observed that the scatter plots appear close to those three vertices as $d$ increases.

Next, we considered the case when $k = 4$: $\Pi_i$, $i = 1,\dots,4$, having $\varepsilon_1 = \cdots = \varepsilon_4 = 1/4$. We set $n = 20$ and $n_1 = \cdots = n_4 = 5$. In Fig. 3, we displayed scatter plots of $(z_{1j},z_{2j},z_{3j})$, $j = 1,\dots,n$, when (a) $d = 100$, (b) $d = 1000$ and (c) $d = 10000$. Based on Theorem 2, we also displayed the triangular pyramid given by (6) with $k = 4$. As expected theoretically, we observed that the scatter plots appear close to the four vertices of the triangular pyramid as $d$ increases. They seemed to converge more slowly in Fig. 3 than in Fig. 2. This is probably because the conditions of Theorem 2 become stricter as $k$ increases.

Figure 3: Toy example to illustrate the asymptotic behaviors of true PC scores when $k = 4$. We plotted $(z_{1j},z_{2j},z_{3j})$. The dashed triangular pyramid is given by (6) with $k = 4$.

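3.2 Consistency property of PC scores when k ≥ 3

Let $\eta_{(0)} = 0$ and $\eta_{(i)} = \sum_{j=1}^{i}\eta_j$ for $i = 1,\dots,k$. We assume the following condition:

Condition 6. $\max_{i=1,\dots,k-2;\ i'=1,\dots,k}\mu_{i,i+1}^T\Sigma_{i'}\mu_{i,i+1}/\Delta_{\min}^2\to 0$ as $d\to\infty$.

As for the estimated PC scores, we have the following result. From Theorem 3, one can classify the $x_j$'s into $k$ groups by the elements of $\hat u_i$, $i = 1,\dots,k-1$.

Theorem 3. Under Conditions 2 to 6, it holds that for $n_i>0$, $i = 1,\dots,k$,
$$\mathrm{plim}_{d\to\infty}\hat z_{ij} = \begin{cases}0 & \text{when } i\ge 2 \text{ and } x_j\in\bigcup_{i'=1}^{i-1}\Pi_{i'},\\ \sqrt{\dfrac{1-\eta_{(i)}}{\eta_i(1-\eta_{(i-1)})}} & \text{when } x_j\in\Pi_i,\\ -\sqrt{\dfrac{\eta_i}{(1-\eta_{(i)})(1-\eta_{(i-1)})}} & \text{when } x_j\in\bigcup_{m=i+1}^{k}\Pi_m\end{cases} \qquad (7)$$
for $i = 1,\dots,k-1$; $j = 1,\dots,n$.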
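The limits in (7) are positive exactly for $x_j\in\Pi_i$ and negative for $x_j$ in later populations. The classification suggested above can therefore be read as a sequential sign rule: assign $x_j$ to the first $\Pi_i$ with $\hat z_{ij}>0$, and to $\Pi_k$ if none is positive. This rule is our reading of the text, not a procedure stated by the authors. It also relies on the sign convention $\hat u_i^Tz_i\ge 0$, which cannot be checked on real data. Below is a Python/NumPy sketch applied to the exact limit points; the helper names are hypothetical.

```python
import numpy as np

def sample_score_limits(counts):
    """Right-hand side of (7): a (k-1) x k array whose column m is the limit point of
    (z_hat_1j, ..., z_hat_{k-1,j}) for x_j in Pi_{m+1}, with eta_i = n_i / n."""
    eta = np.asarray(counts, dtype=float)
    eta = eta / eta.sum()
    cum = np.concatenate([[0.0], np.cumsum(eta)])       # cum[i] = eta_(i)
    k = eta.size
    L = np.zeros((k - 1, k))
    for i in range(1, k):
        L[i - 1, i - 1] = np.sqrt((1 - cum[i]) / (eta[i - 1] * (1 - cum[i - 1])))
        L[i - 1, i:] = -np.sqrt(eta[i - 1] / ((1 - cum[i]) * (1 - cum[i - 1])))
    return L

def classify_by_signs(Z_hat):
    """Assign x_j to the first i with z_hat_ij > 0, or to Pi_k if none is positive.

    Z_hat: (k-1) x n array of sample PC scores with the sign convention of (7).
    Returns 0-based population labels.
    """
    km1, n = Z_hat.shape
    labels = np.full(n, km1)                             # default: last population
    for j in range(n):
        positive = np.flatnonzero(Z_hat[:, j] > 0)
        if positive.size:
            labels[j] = positive[0]
    return labels

L = sample_score_limits([5, 5, 2])          # (n_1, n_2, n_3) as in Fig. 5(ii)
print(np.round(L, 3))
print(classify_by_signs(L))                 # applied to the limit points themselves
```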


4 Real data examples 

4.1 Clustering when k = 2 


We analyzed gene expression data by Chiaretti et al. (2004), in which the data set consists of 12625 ($= d$) genes and 128 samples. The data set has two tumor cellular subtypes, $\Pi_1$: B-cell (95 samples) and $\Pi_2$: T-cell (33 samples). Refer to Jeffery et al. (2006) as well. We considered three cases:

- (a) $n = 10$ samples consisting of the first 5 samples from each of $\Pi_1$ and $\Pi_2$ (i.e., $n_1 = 5$ and $n_2 = 5$);
- (b) $n = 40$ samples consisting of the first 20 samples from each of $\Pi_1$ and $\Pi_2$ (i.e., $n_1 = 20$ and $n_2 = 20$); and
- (c) $n = 128$ samples consisting of $n_1 = 95$ samples from $\Pi_1$ and $n_2 = 33$ samples from $\Pi_2$.

In the top panels of Fig. 4, we displayed scatter plots of the first two PC scores, $(\hat z_{1j},\hat z_{2j})$'s, for (a), (b) and (c). Following Corollary 1, we denoted $(\eta_2/\eta_1)^{1/2}$ and $-(\eta_1/\eta_2)^{1/2}$ by dotted lines.

For (a), we observed that the estimated PC scores perform well: the first PC scores gathered around $(\eta_2/\eta_1)^{1/2}$ or $-(\eta_1/\eta_2)^{1/2}$. For (b), the estimated PC scores performed adequately except for two points from $\Pi_2$. Those two samples, which are the ninth and twentieth samples of $\Pi_2$, are probably outliers; in fact, the two points are far from the cluster of $\Pi_2$. The other 38 samples were perfectly classified into the two groups by the sign of the first PC scores. As for (c), there seemed to be two clusters apart from the two samples, but we could not classify the data set by the sign of the first PC scores. This is probably because $\eta_1$ and $\eta_2$ are unbalanced and $n$ is large. From (2), $\lambda_1$ becomes small when the mixing proportions are unbalanced. The first eigenspace was possibly affected by the other eigenspaces, so that the first PC scores appear in the wrong direction.

We then tested the clustering without the two outlying samples, using the remaining 31 samples for $\Pi_2$. We considered three cases for samples from $\Pi_1$:

- (d) the first 16 samples from $\Pi_1$, so that $n_1 = 16$, $n_2 = 31$, $n = 47$ and $\eta_1/\eta_2\approx 0.5$;
- (e) the first 31 samples from $\Pi_1$, so that $n_1 = 31$, $n_2 = 31$, $n = 62$ and $\eta_1/\eta_2 = 1$; and
- (f) the first 62 samples from $\Pi_1$, so that $n_1 = 62$, $n_2 = 31$, $n = 93$ and $\eta_1/\eta_2 = 2$.

In the bottom panels of Fig. 4, we displayed scatter plots of the $(\hat z_{1j},\hat z_{2j})$'s for (d), (e) and (f). For (d) and (e), we observed that the estimated PC scores perform well. As for (f), there seemed to be two clusters, but we could not classify the data set by the sign of the first PC scores. Note that $\eta_1$ and $\eta_2$ are unbalanced in both (d) and (f); even so, the estimated PC scores worked well for (d). Using the noise-reduction methodology of Yata and Aoshima (2012), we estimated the ratio of the first eigenvalues, $\lambda_{11}/\lambda_{21}$, as 1.598. Since $\Sigma = \varepsilon_1\varepsilon_2\mu_{1,2}\mu_{1,2}^T+\varepsilon_1\Sigma_1+\varepsilon_2\Sigma_2$, the first eigenspace of $\Sigma$ in (d) is less affected by the first eigenspaces of the $\Sigma_i$'s than in (f). This is probably the reason why the estimated PC scores performed well even in (d).
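The dotted reference lines in Fig. 4 are simple functions of $(n_1,n_2)$. The following Python/NumPy lines compute them for cases (a) to (f). They also compute $\eta_1\eta_2$: by (2), with $\eta_i$ in place of $\varepsilon_i$, this product indicates how small $\lambda_1$ becomes in the unbalanced cases.

```python
import numpy as np

cases = {  # (n_1, n_2) for cases (a)-(f) above
    "a": (5, 5), "b": (20, 20), "c": (95, 33),
    "d": (16, 31), "e": (31, 31), "f": (62, 31),
}
points = {}
for name, (n1, n2) in cases.items():
    eta1, eta2 = n1 / (n1 + n2), n2 / (n1 + n2)
    # Corollary 1: z_hat_1j -> (eta2/eta1)^{1/2} on Pi_1 and -(eta1/eta2)^{1/2} on Pi_2
    points[name] = (np.sqrt(eta2 / eta1), -np.sqrt(eta1 / eta2))
    # eta1 * eta2 stands in for eps_1 eps_2 in lambda_1 ~ eps_1 eps_2 Delta_{1,2}, see (2)
    print(f"({name}) eta1/eta2 = {eta1 / eta2:.3f}, eta1*eta2 = {eta1 * eta2:.3f}, "
          f"points = {points[name][0]:.3f}, {points[name][1]:.3f}")
```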


4.2 Clustering when k ≥ 3

We analyzed gene expression data by Pomeroy et al. (2002), in which the data set consists of five brain tumor types. However, we only used the 4 classes given in the CRAN R package 'rda', in which the data set consists of 5597 ($= d$) genes and 34 samples. We set the four tumor types as $\Pi_1$: medulloblastomas (10 samples), $\Pi_2$: malignant gliomas (10 samples), $\Pi_3$: normal cerebellums (4 samples) and $\Pi_4$: AT/RT (10 samples). We first considered the case when $k = 3$: $\Pi_i$, $i = 1,2,3$, so that $n_1 = 10$, $n_2 = 10$, $n_3 = 4$ and $n = 24$. In the left panel of Fig. 5, we displayed scatter plots of the first two PC scores, $(\hat z_{1j},\hat z_{2j})$'s. Based on Theorem 3, we also displayed the triangle given by (7) with $k = 3$. Although there seemed to be three clusters, they did not gather around the vertices. This is probably because the rate of convergence is slow when $k\ge 3$, since $d$ is small compared to such a large $n$. We tested the clustering with a small sample size: the first 5 samples from each of $\Pi_1$ and $\Pi_2$ and the last 2 samples from $\Pi_3$, so that $n_1 = 5$, $n_2 = 5$, $n_3 = 2$ and $n = 12$. We displayed the results in the right panel of Fig. 5. The samples seemed to be classified into three classes around each vertex.

Next, we considered the case when $k = 4$: $\Pi_i$, $i = 1,\dots,4$, so that $n_1 = 10$, $n_2 = 10$, $n_3 = 4$, $n_4 = 10$ and $n = 34$. In Fig. 6, we displayed scatter plots of the first three PC scores. Although there seemed to be four clusters corresponding to the $\Pi_i$'s, the data set seemed not to hold the consistency property given by (7) in Theorem 3. This is probably because, with such a large $k$, some of Conditions 2 to 6 in Theorem 3 are not met.

Figure 4: We displayed scatter plots of the first two PC scores, supposing $k = 2$ in the data set of Chiaretti et al. (2004). We denoted them by small circles when $x_j\in\Pi_1$ and by small triangles when $x_j\in\Pi_2$. The theoretical convergent points, $(\eta_2/\eta_1)^{1/2}$ and $-(\eta_1/\eta_2)^{1/2}$, are denoted by dotted lines. The two samples encircled by dots in (b) and (c) are probably outliers.


4.3 Clustering: Special case 


We analyzed gene expression data by Armstrong et al. (2002), in which the data set consists of three leukemia subtypes having 12582 ($= d$) genes. We used two classes, $\Pi_1$: acute lymphoblastic leukemia (24 samples) and $\Pi_2$: mixed-lineage leukemia (20 samples), so that $n_1 = 24$, $n_2 = 20$ and $n = 44$. In Fig. 7, we displayed scatter plots of the first three PC scores. We observed that the data set is perfectly separated by the sign of the second PC scores. This figure looks completely different from Fig. 4, probably because the largest eigenvalue, $\lambda_{11}$ or $\lambda_{21}$, is too large. For $k = 2$, we give the following result to explain the phenomenon in Fig. 7. Under the assumptions of Proposition 1, one can classify the $x_j$'s into two groups by some $i$-th PC score even when Condition 1 is not met:


Proposition 1. Assume $\max_{i=1,2}\lambda_{i2}/\Delta_{1,2}\to 0$ as $d\to\infty$. Then, there exists some positive integer $i_\star$ such that
$$\frac{\lambda_{i_\star}}{\varepsilon_1\varepsilon_2\Delta_{1,2}}\to 1 \quad\text{as } d\to\infty.$$
Furthermore, assume that $\lambda_{i_\star}$ is distinct in the sense that $\liminf_{d\to\infty}|\lambda_{i'}/\lambda_{i_\star}-1|>0$ for $i' = 1,\dots,d$ ($i'\ne i_\star$). Then, if $h_{i_\star}^T\mu_{1,2}\ge 0$, it holds that $\mathrm{Angle}(h_{i_\star},\mu_{1,2})\to 0$ as $d\to\infty$ and, for $j = 1,\dots,n$,
$$\mathrm{plim}_{d\to\infty}\frac{s_{i_\star j}}{\lambda_{i_\star}^{1/2}} = \begin{cases}(\varepsilon_2/\varepsilon_1)^{1/2} & \text{when } x_j\in\Pi_1,\\ -(\varepsilon_1/\varepsilon_2)^{1/2} & \text{when } x_j\in\Pi_2.\end{cases}$$

[Figure 5: panels (i) $(n_1,n_2,n_3) = (10,10,4)$ and (ii) $(n_1,n_2,n_3) = (5,5,2)$]

Figure 5: We displayed scatter plots of the first two PC scores, supposing $k = 3$ in the data set of Pomeroy et al. (2002). We denoted them by small circles when $x_j\in\Pi_1$, by small triangles when $x_j\in\Pi_2$ and by small squares when $x_j\in\Pi_3$. The theoretical convergent points are denoted by the vertices of the triangle.

Figure 6: We displayed scatter plots of the first three PC scores, supposing $k = 4$ in the data set of Pomeroy et al. (2002).


We estimated the largest eigenvalues by using the noise-reduction methodology given by Yata and Aoshima (2012), and we estimated $\Delta_{1,2}$ by using an unbiased estimator given by Aoshima and Yata (2014). We obtained the estimates of $(\lambda_{11}/\Delta_{1,2},\lambda_{21}/\Delta_{1,2})$ as $(0.465, 0.787)$, so Condition 1 is clearly not met. In addition, estimating the $\varepsilon_i$'s by the $\eta_i$'s, we had $\varepsilon_2\lambda_{21}>\varepsilon_1\varepsilon_2\Delta_{1,2}$. Since $\Sigma = \varepsilon_1\varepsilon_2\mu_{1,2}\mu_{1,2}^T+\varepsilon_1\Sigma_1+\varepsilon_2\Sigma_2$, the first eigenspace of $\Sigma$ is thus probably the first eigenspace of $\Sigma_2$. We conclude that $i_\star$ in Proposition 1 must be 2. This is why the data set can be separated by the sign of the second PC scores in Fig. 7.
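The comparison of $\varepsilon_2\lambda_{21}$ with $\varepsilon_1\varepsilon_2\Delta_{1,2}$ above is plug-in arithmetic on the reported estimates. The following Python lines reproduce it. Everything is divided by $\Delta_{1,2}$, and $\eta_i$ is used in place of $\varepsilon_i$, as in the text.

```python
# Plug-in comparison for the Armstrong et al. (2002) data, using the estimates above.
n1, n2 = 24, 20
eta1, eta2 = n1 / (n1 + n2), n2 / (n1 + n2)   # estimates of eps_1, eps_2
lam11_over_delta, lam21_over_delta = 0.465, 0.787

mean_term = eta1 * eta2                      # eps_1 eps_2 Delta_{1,2}, divided by Delta_{1,2}
sigma2_term = eta2 * lam21_over_delta        # eps_2 lambda_{21}, divided by Delta_{1,2}
print(f"eps1*eps2 = {mean_term:.3f}, eps2*lambda21/Delta = {sigma2_term:.3f}")
print("Sigma_2 spike dominates the mean-difference term:", sigma2_term > mean_term)
```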


5 Concluding remarks 

In this paper, we considered the mixture model given by (1) in the HDLSS context such as $d\to\infty$ while $n$ is fixed. We studied asymptotic properties of both the true PC scores and the sample PC scores for the mixture model. We gave theoretical reasons why PCA is effective for clustering HDLSS data, and we showed theoretically that HDLSS data can be classified by the sign of the first several PC scores. However, in actual HDLSS data analyses, one may encounter cases such as those in Figs. 4(c) and 7, where the data set is not classified by the sign of the first several PC scores. Several reasons should be considered: (i) actual HDLSS data sets often include several outliers; (ii) the regularity conditions are not met; and (iii) $d$ is not sufficiently large. Thus, we recommend the following three steps: (I) apply PCA to the HDLSS data; (II) by using PC scores, map the data set onto a feature space such as the first three eigenspaces; and (III) apply a general clustering method such as the $k$-means method in the feature space.

We are now investigating the theory further and hope to bring it closer to the results of actual analyses.

Figure 7: We displayed scatter plots of the first three PC scores, supposing $k = 2$ in the data set of Armstrong et al. (2002).
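The three recommended steps (I) to (III) can be sketched in Python/NumPy as follows. This illustration rests on assumptions of our own. The features are the unnormalized sample PC scores $\hat h_i^T(x_j-\bar x_n) = \{(n-1)\hat\lambda_i\}^{1/2}\hat u_{ij}$, so that directions with larger variance weigh more. Step (III) uses plain Lloyd iterations with a deterministic farthest-point start, and any standard $k$-means implementation could be substituted. The helper name `pca_kmeans` is hypothetical.

```python
import numpy as np

def pca_kmeans(X, n_components=3, k=2, n_iter=100):
    """Sketch of steps (I)-(III): PCA via the dual matrix S_D, then k-means.

    X: d x n data matrix. Returns (labels, F), where F is the n x n_components
    matrix of unnormalized sample PC scores used as the feature space.
    """
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))          # (I) eigen-decompose S_D
    order = np.argsort(vals)[::-1][:n_components]
    scale = np.sqrt((n - 1) * np.clip(vals[order], 0, None))
    F = vecs[:, order] * scale                                 # (II) n x m feature matrix
    centers = [F[0]]                                           # (III) farthest-point start
    for _ in range(1, k):
        gap = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[gap.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):                                    # Lloyd iterations
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                              # assignments stable
        labels = new_labels
        centers = np.array([F[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels, F

rng = np.random.default_rng(5)
d = 5000                                                       # hypothetical two-class data
X = np.hstack([rng.standard_normal((d, 10)), 3.0 + rng.standard_normal((d, 10))])
labels, F = pca_kmeans(X, n_components=3, k=2)
print(labels)
```

With the normalized scores $\hat z_{ij}$ instead, every retained component has unit mean square, so noise components would weigh as much as the informative one. That is why the sketch scales by $\hat\lambda_i^{1/2}$.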
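A Appendix

Throughout, let $u_i = (u_{i1},\dots,u_{in})^T$, where
$$u_{ij} = \begin{cases}0 & \text{when } i\ge 2 \text{ and } x_j\in\bigcup_{i'=1}^{i-1}\Pi_{i'},\\ \sqrt{\dfrac{1-\eta_{(i)}}{n\eta_i(1-\eta_{(i-1)})}} & \text{when } x_j\in\Pi_i,\\ -\sqrt{\dfrac{\eta_i}{n(1-\eta_{(i)})(1-\eta_{(i-1)})}} & \text{when } x_j\in\bigcup_{m=i+1}^{k}\Pi_m\end{cases}$$
for $i = 1,\dots,k-1$; $j = 1,\dots,n$. Let $\nu_i = \sum_{m=1}^{k}\eta_m(\mu_i-\mu_m)$ for $i = 1,\dots,k$. Let $V = (\nu_{(1)},\dots,\nu_{(n)})$, where $\nu_{(j)} = \nu_i$ according to $x_j\in\Pi_i$ for $j = 1,\dots,n$. Note that $V1_n = \sum_{i=1}^{k}n_i\nu_i = 0$. Since $\mathrm{rank}(V)\le k-1$, we can define the eigen-decomposition of $V^TV/n$ by $V^TV/n = \sum_{i=1}^{k-1}\tilde\lambda_i\tilde u_i\tilde u_i^T$. Here $\tilde\lambda_1\ge\cdots\ge\tilde\lambda_{k-1}\ge 0$ are eigenvalues of $V^TV/n$, and $\tilde u_i = (\tilde u_{i1},\dots,\tilde u_{in})^T$ is a unit eigenvector corresponding to $\tilde\lambda_i$ for each $i$. We assume $\tilde u_i^Tu_i\ge 0$ for $i = 1,\dots,k-1$, without loss of generality.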
A.1 Lemmas and their proofs

Lemma A.1. When $k = 2$, it holds that under Conditions 2 to 4
$$\mathrm{plim}_{d\to\infty}\frac{(n-1)S_D-\mathrm{tr}(\Sigma_1)P_n}{\Delta_{1,2}} = rr^T.$$

Proof. Let $\mu_\eta = \eta_1\mu_1+\eta_2\mu_2$. Then, we can write that $x_j-\mu_\eta = (x_j-\mu_i)+(-1)^{i+1}(1-\eta_i)\mu_{1,2}$ when $x_j\in\Pi_i$, $i = 1,2$. From the fact that $\lambda_{i1}\le\mathrm{tr}(\Sigma_i^2)^{1/2}$, we have that $\mathrm{Var}\{(x_j-\mu_i)^T\mu_{1,2} \mid x_j\in\Pi_i\} = \mu_{1,2}^T\Sigma_i\mu_{1,2}\le\Delta_{1,2}\lambda_{i1} = o(\Delta_{1,2}^2)$ as $d\to\infty$ for $j = 1,\dots,n$; $i = 1,2$, under Condition 2. Also, we have that $\mathrm{Var}\{(x_j-\mu_i)^T(x_{j'}-\mu_{i'}) \mid x_j\in\Pi_i, x_{j'}\in\Pi_{i'}\} = \mathrm{tr}(\Sigma_i\Sigma_{i'})\le\mathrm{tr}(\Sigma_i^2)^{1/2}\mathrm{tr}(\Sigma_{i'}^2)^{1/2} = o(\Delta_{1,2}^2)$ for $j\ne j'$ and $i,i' = 1,2$, under Condition 2. Then, by Chebyshev's inequality, for any $\tau>0$, under Condition 2, it holds that for all $j\ne j'$ and $i,i' = 1,2$
$$P\{|(x_j-\mu_i)^T(x_{j'}-\mu_{i'})/\Delta_{1,2}|>\tau \mid x_j\in\Pi_i, x_{j'}\in\Pi_{i'}\} = o(1) \quad\text{and}\quad P\{|(x_j-\mu_i)^T\mu_{1,2}/\Delta_{1,2}|>\tau \mid x_j\in\Pi_i\} = o(1), \qquad (8)$$
so that $(x_j-\mu_i)^T(x_{j'}-\mu_{i'})/\Delta_{1,2} = o_P(1)$ and $(x_j-\mu_i)^T\mu_{1,2}/\Delta_{1,2} = o_P(1)$ when $x_j\in\Pi_i$ and $x_{j'}\in\Pi_{i'}$ ($j\ne j'$). We note that $E(\|x_j-\mu_i\|^2 \mid x_j\in\Pi_i) = \mathrm{tr}(\Sigma_i)$. Similar to (8), under Condition 3, it holds that $\{\|x_j-\mu_i\|^2-\mathrm{tr}(\Sigma_i)\}/\Delta_{1,2} = o_P(1)$ when $x_j\in\Pi_i$ for $j = 1,\dots,n$; $i = 1,2$. By noting that $\{\mathrm{tr}(\Sigma_1)-\mathrm{tr}(\Sigma_2)\}/\Delta_{1,2} = o(1)$ under Condition 4, we have that
$$\mathrm{plim}_{d\to\infty}\frac{(X-\mu_\eta1_n^T)^T(X-\mu_\eta1_n^T)-\mathrm{tr}(\Sigma_1)I_n}{\Delta_{1,2}} = rr^T$$
under Conditions 2 to 4. By noting that $P_n(X-\mu_\eta1_n^T)^T(X-\mu_\eta1_n^T)P_n/(n-1) = S_D$ and $r^TP_n = r^T$ (from $r^T1_n = 0$), we conclude the result. □

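Lemma A.2. Let $\tilde\mu_{i,i+1} = \mu_{i,i+1}/\Delta_{i,i+1}^{1/2}$ for $i = 1,\dots,k-1$, and let $\Delta_{(i,j)} = \Delta_{j,j+1}/\Delta_{i,i+1}$ for $i,j = 1,\dots,k-1$ ($i<j$). Under Conditions 1 and 5, it holds that as $d\to\infty$
$$\frac{\lambda_i}{\Delta_{i,i+1}} = \frac{\varepsilon_i(1-\varepsilon_{(i)})}{1-\varepsilon_{(i-1)}}+o(1) \quad\text{and}\quad h_i^T\tilde\mu_{i,i+1} = 1+o(1) \quad\text{for } i = 1,\dots,k-1;$$
$$h_i^T\tilde\mu_{i-1,i} = -\frac{1-\varepsilon_{(i)}}{1-\varepsilon_{(i-1)}}\Delta_{(i-1,i)}^{1/2}\{1+o(1)\} \quad\text{for } i = 2,\dots,k-1 \text{ when } k\ge 3;$$
and
$$h_j^T\tilde\mu_{i,i+1} = o(\Delta_{(i,j)}^{1/2}) \quad\text{for } i,j = 1,\dots,k-1\ (i+1<j) \text{ when } k\ge 3.$$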

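Proof. Let $e\ (\in\mathbb{R}^d)$ be an arbitrary unit vector. Since $\Sigma = \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\varepsilon_i\varepsilon_j\mu_{i,j}\mu_{i,j}^T+\sum_{i=1}^{k}\varepsilon_i\Sigma_i$, it holds that as $d\to\infty$
$$\frac{e^T\Sigma e}{\Delta_{k-1,k}} = \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\frac{\varepsilon_i\varepsilon_j(e^T\mu_{i,j})^2}{\Delta_{k-1,k}}+o(1) \qquad (9)$$
under Condition 1. Note that $\mu_{i,j} = \sum_{l=i}^{j-1}\mu_{l,l+1}$ for $i,j = 1,\dots,k$ ($i<j$). Thus it holds that
$$\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\varepsilon_i\varepsilon_j\mu_{i,j}\mu_{i,j}^T = \sum_{i=1}^{k-1}\varepsilon_{(i)}(1-\varepsilon_{(i)})\mu_{i,i+1}\mu_{i,i+1}^T+\sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\varepsilon_{(i)}(1-\varepsilon_{(j)})(\mu_{i,i+1}\mu_{j,j+1}^T+\mu_{j,j+1}\mu_{i,i+1}^T). \qquad (10)$$
From the fact that $\lambda_1 = h_1^T\Sigma h_1 = \max_e(e^T\Sigma e)$, by combining (9) with (10), under Conditions 1 and 5, we have that
$$\frac{\lambda_1}{\Delta_{1,2}} = \max_e\{\varepsilon_{(1)}(1-\varepsilon_{(1)})(e^T\tilde\mu_{1,2})^2\}+o(1) = \varepsilon_{(1)}(1-\varepsilon_{(1)})+o(1).$$
Hence, from the assumption that $h_1^T\mu_{1,2}\ge 0$, it holds that $h_1^T\tilde\mu_{1,2} = 1+o(1)$.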

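Next, we consider $\lambda_2$ and $h_2$. Note that $\tilde\mu_{i,i+1}^T\tilde\mu_{j,j+1} = o(1)$ and $\Delta_{(i,j)} = o(1)$ for $i,j = 1,\dots,k-1$ ($i<j$) under Condition 5. Then, under Conditions 1 and 5, it holds that for $j\ge 2$
$$0 = \frac{h_1^T\Sigma h_j}{\Delta_{1,2}} = \varepsilon_{(1)}(1-\varepsilon_{(1)})\{1+o(1)\}\tilde\mu_{1,2}^Th_j+\varepsilon_{(1)}(1-\varepsilon_{(2)})\tilde\mu_{2,3}^Th_j\Delta_{(1,2)}^{1/2}+o(\Delta_{(1,2)}^{1/2})$$
from (9), (10) and $h_1^T\tilde\mu_{2,3} = o(1)$, so that for $j\ge 2$
$$h_j^T\tilde\mu_{1,2} = -\frac{1-\varepsilon_{(2)}}{1-\varepsilon_{(1)}}\tilde\mu_{2,3}^Th_j\Delta_{(1,2)}^{1/2}+o(\Delta_{(1,2)}^{1/2}). \qquad (11)$$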

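By combining (9) with (10) and (11), we have that
$$\frac{\lambda_2}{\Delta_{2,3}} = \frac{h_2^T\Sigma h_2}{\Delta_{2,3}} = \varepsilon_{(2)}(1-\varepsilon_{(2)})(\tilde\mu_{2,3}^Th_2)^2+\varepsilon_{(1)}(1-\varepsilon_{(1)})\frac{(\tilde\mu_{1,2}^Th_2)^2}{\Delta_{(1,2)}}+2\varepsilon_{(1)}(1-\varepsilon_{(2)})\frac{(\tilde\mu_{1,2}^Th_2)(\tilde\mu_{2,3}^Th_2)}{\Delta_{(1,2)}^{1/2}}+o(1)$$
$$= \Big\{\varepsilon_{(2)}(1-\varepsilon_{(2)})-\frac{\varepsilon_{(1)}(1-\varepsilon_{(2)})^2}{1-\varepsilon_{(1)}}\Big\}(\tilde\mu_{2,3}^Th_2)^2+o(1) = \frac{\varepsilon_2(1-\varepsilon_{(2)})}{1-\varepsilon_{(1)}}(\tilde\mu_{2,3}^Th_2)^2+o(1) \qquad (12)$$
under Conditions 1 and 5. The derivations of (11) and (12) use only $e^Th_1 = 0$, so they remain valid with $h_2$ replaced by any unit vector $e$ orthogonal to $h_1$. Moreover, $\lambda_2$ is the maximum of $e^T\Sigma e$ over such vectors, and $h_1^T\tilde\mu_{2,3} = o(1)$. Hence (12) gives $\lambda_2/\Delta_{2,3} = \varepsilon_2(1-\varepsilon_{(2)})/(1-\varepsilon_{(1)})+o(1)$ and $(\tilde\mu_{2,3}^Th_2)^2 = 1+o(1)$. Hence, from the assumption that $h_2^T\mu_{2,3}\ge 0$, it holds that $h_2^T\tilde\mu_{2,3} = 1+o(1)$.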
Next, we consider $\lambda_3$ and $h_3$. Note that $h_j^T\tilde\mu_{2,3} = o(1)$ for $j\ge 3$, which follows from $h_2^T\tilde\mu_{2,3} = 1+o(1)$. Then, under Conditions 1 and 5, we have that for $j\ge 3$
$$0 = \frac{h_1^T\Sigma h_j}{\Delta_{1,2}} = \varepsilon_{(1)}(1-\varepsilon_{(1)})\{1+o(1)\}\tilde\mu_{1,2}^Th_j+\varepsilon_{(1)}(1-\varepsilon_{(2)})\{1+o(1)\}\tilde\mu_{2,3}^Th_j\Delta_{(1,2)}^{1/2}+\varepsilon_{(1)}(1-\varepsilon_{(3)})\tilde\mu_{3,4}^Th_j\Delta_{(1,3)}^{1/2}+o(\Delta_{(1,3)}^{1/2}) \qquad (13)$$
and
$$0 = \frac{h_2^T\Sigma h_j}{\Delta_{2,3}} = \frac{\varepsilon_2(1-\varepsilon_{(2)})}{1-\varepsilon_{(1)}}\{1+o(1)\}\tilde\mu_{2,3}^Th_j+\frac{\varepsilon_2(1-\varepsilon_{(3)})}{1-\varepsilon_{(1)}}\tilde\mu_{3,4}^Th_j\Delta_{(2,3)}^{1/2}+o(\Delta_{(2,3)}^{1/2}). \qquad (14)$$
Both follow from (9) to (11), $h_1^T\tilde\mu_{2,3} = o(1)$, $h_1^T\tilde\mu_{3,4} = o(1)$ and $h_2^T\tilde\mu_{3,4} = o(1)$. Then, by combining (13) and (14), under Conditions 1 and 5, it holds that for $j\ge 3$
$$h_j^T\tilde\mu_{1,2} = o(\Delta_{(1,3)}^{1/2}) \quad\text{and}\quad h_j^T\tilde\mu_{2,3} = -\frac{1-\varepsilon_{(3)}}{1-\varepsilon_{(2)}}\tilde\mu_{3,4}^Th_j\Delta_{(2,3)}^{1/2}+o(\Delta_{(2,3)}^{1/2}). \qquad (15)$$

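Similar to (12), by combining (9) with (10) and (15), under Conditions 1 and 5, we have that
$$\frac{\lambda_3}{\Delta_{3,4}} = \varepsilon_{(3)}(1-\varepsilon_{(3)})(\tilde\mu_{3,4}^Th_3)^2+\varepsilon_{(2)}(1-\varepsilon_{(2)})\frac{(\tilde\mu_{2,3}^Th_3)^2}{\Delta_{(2,3)}}+2\varepsilon_{(2)}(1-\varepsilon_{(3)})\frac{(\tilde\mu_{2,3}^Th_3)(\tilde\mu_{3,4}^Th_3)}{\Delta_{(2,3)}^{1/2}}+o(1)$$
$$= \Big\{\varepsilon_{(3)}(1-\varepsilon_{(3)})-\frac{\varepsilon_{(2)}(1-\varepsilon_{(3)})^2}{1-\varepsilon_{(2)}}\Big\}(\tilde\mu_{3,4}^Th_3)^2+o(1) = \frac{\varepsilon_3(1-\varepsilon_{(3)})}{1-\varepsilon_{(2)}}(\tilde\mu_{3,4}^Th_3)^2+o(1),$$
so that $h_3^T\tilde\mu_{3,4} = 1+o(1)$ from the assumption that $h_3^T\mu_{3,4}\ge 0$.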

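In a way similar to $\lambda_3$ and $h_3$, for $\lambda_i$ and $h_i$ ($4\le i\le k-1$) we have that $\lambda_i/\Delta_{i,i+1} = \varepsilon_i(1-\varepsilon_{(i)})/(1-\varepsilon_{(i-1)})+o(1)$, $h_i^T\tilde\mu_{i,i+1} = 1+o(1)$ and $h_i^T\tilde\mu_{i-1,i} = -\{(1-\varepsilon_{(i)})/(1-\varepsilon_{(i-1)})\}\Delta_{(i-1,i)}^{1/2}\{1+o(1)\}$. These hold together with $h_j^T\tilde\mu_{i,i+1} = o(\Delta_{(i,j)}^{1/2})$ for $i,j = 1,\dots,k-1$ ($i+1<j$), under Conditions 1 and 5. This concludes the results. □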
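Identity (10) is purely algebraic: it only uses $\sum_i\varepsilon_i = 1$, so it can be spot-checked numerically. Below is a Python/NumPy sketch with arbitrary, hypothetical means and proportions.

```python
import numpy as np

def pairwise_form(eps, mus):
    """Left-hand side of (10): sum_{i<j} eps_i eps_j mu_{i,j} mu_{i,j}^T."""
    k, d = mus.shape
    out = np.zeros((d, d))
    for i in range(k - 1):
        for j in range(i + 1, k):
            v = mus[i] - mus[j]                       # mu_{i,j}
            out += eps[i] * eps[j] * np.outer(v, v)
    return out

def consecutive_form(eps, mus):
    """Right-hand side of (10), in terms of the steps mu_{i,i+1}."""
    k, d = mus.shape
    cum = np.cumsum(eps)                              # cum[i] = eps_(i+1)
    steps = mus[:-1] - mus[1:]                        # steps[i] = mu_{i+1,i+2}
    out = np.zeros((d, d))
    for i in range(k - 1):
        out += cum[i] * (1 - cum[i]) * np.outer(steps[i], steps[i])
    for i in range(k - 2):
        for j in range(i + 1, k - 1):
            cross = np.outer(steps[i], steps[j])
            out += cum[i] * (1 - cum[j]) * (cross + cross.T)
    return out

rng = np.random.default_rng(7)
eps = np.array([0.1, 0.2, 0.3, 0.4])                   # must sum to 1
mus = rng.normal(size=(4, 6))
print(np.allclose(pairwise_form(eps, mus), consecutive_form(eps, mus)))
```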

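Lemma A.3. Under Conditions 1 and 5, it holds that for $i = 1,\dots,k-1$
$$\lim_{d\to\infty}\frac{h_i^T\sum_{m=1}^{k}\varepsilon_m(\mu_{i'}-\mu_m)}{\lambda_i^{1/2}} = \begin{cases}0 & \text{when } i\ge 2 \text{ and } i'<i,\\ \sqrt{\dfrac{1-\varepsilon_{(i)}}{\varepsilon_i(1-\varepsilon_{(i-1)})}} & \text{when } i' = i,\\ -\sqrt{\dfrac{\varepsilon_i}{(1-\varepsilon_{(i)})(1-\varepsilon_{(i-1)})}} & \text{when } i'>i.\end{cases}$$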


















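Proof. We write that
$$\sum_{m=1}^{k}\varepsilon_m(\mu_1-\mu_m) = \sum_{m=1}^{k-1}(1-\varepsilon_{(m)})\mu_{m,m+1}, \qquad \sum_{m=1}^{k}\varepsilon_m(\mu_k-\mu_m) = -\sum_{m=1}^{k-1}\varepsilon_{(m)}\mu_{m,m+1}$$
and
$$\sum_{m=1}^{k}\varepsilon_m(\mu_i-\mu_m) = -\sum_{m=1}^{i-1}\varepsilon_{(m)}\mu_{m,m+1}+\sum_{m=i}^{k-1}(1-\varepsilon_{(m)})\mu_{m,m+1} \quad\text{for } i = 2,\dots,k-1. \qquad (16)$$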


By using Lemma lA.21 under Conditions 1 and 5, we have that as d ^ oo 

J.T -/^m) _ , n^_1 ^ , ('1^ O 

hi 2 _^ - —Yp^ -— hi - —p -h o(l) — 1 — 6 ( 1 ) + o(l) and 


^ A 1/2 

m=l ^1,2 


A '■ 

^1,2 


hi E + 0(1) = -^(.) + 0(1) f”»' = 2. * 


A 1/2 

m=l ^1,1+1 


A 


1,2 


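from (16). Also, by using Lemma A.2 under Conditions 1 and 5, we have that for $i = 2,\dots,k-1$; $i' = i+1,\dots,k$; $i'' = 1,\dots,i-1$

$$\frac{h_i^T\sum_{m=1}^{k}\varepsilon_m(\mu_i-\mu_m)}{\Delta_{i,i+1}^{1/2}} = (1-\varepsilon_{(i)}) + \frac{\varepsilon_{(i-1)}(1-\varepsilon_{(i)})}{1-\varepsilon_{(i-1)}} + o(1) = \frac{1-\varepsilon_{(i)}}{1-\varepsilon_{(i-1)}} + o(1),$$

$$\frac{h_i^T\sum_{m=1}^{k}\varepsilon_m(\mu_{i'}-\mu_m)}{\Delta_{i,i+1}^{1/2}} = -\varepsilon_{(i)} + \frac{\varepsilon_{(i-1)}(1-\varepsilon_{(i)})}{1-\varepsilon_{(i-1)}} + o(1) = -\frac{\varepsilon_i}{1-\varepsilon_{(i-1)}} + o(1)$$

and

$$\frac{h_i^T\sum_{m=1}^{k}\varepsilon_m(\mu_{i''}-\mu_m)}{\Delta_{i,i+1}^{1/2}} = o(1).$$

Thus, from Lemma A.2, we can conclude the results. □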
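The limits in Lemma A.3 can be compared with finite-scale values in a hypothetical setting: orthogonal mean differences whose squared norms differ by factors of $10^4$, standing in for Conditions 1 and 5. Since $\sum_{m}\varepsilon_m(\mu_{i'}-\mu_m)=\mu_{i'}-\mu$, the sketch computes $h_i^T(\mu_{i'}-\mu)/\lambda_i^{1/2}$ for the eigenpairs of the between-class matrix and places it next to the closed-form limits; `lemma_a3_limit` and `lemma_a3_scores` are illustrative names, not functions from the paper.

```python
import numpy as np


def lemma_a3_limit(eps, i, ip):
    """Limit in Lemma A.3 for 1-based indices i (eigenvector) and ip (class i')."""
    eps = np.asarray(eps, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(eps)))   # cum[i] = eps_(i), eps_(0) = 0
    if ip < i:
        return 0.0
    if ip == i:
        return np.sqrt((1 - cum[i]) / (eps[i - 1] * (1 - cum[i - 1])))
    return -np.sqrt(eps[i - 1] / ((1 - cum[i]) * (1 - cum[i - 1])))


def lemma_a3_scores(eps, scales, d=6):
    """h_i^T (mu_{i'} - mu) / lambda_i^{1/2} for the between-class matrix, assuming
    orthogonal mean differences with ||mu_{m,m+1}||^2 = scales[m]."""
    eps = np.asarray(eps, dtype=float)
    k = len(eps)
    gaps = np.zeros((k - 1, d))
    for m, s in enumerate(scales):
        gaps[m, m] = np.sqrt(s)
    mus = np.zeros((k, d))
    for m in range(k - 2, -1, -1):                   # mu_m = mu_{m+1} + mu_{m,m+1}
        mus[m] = mus[m + 1] + gaps[m]
    centred = mus - eps @ mus                        # rows mu_{i'} - mu
    vals, vecs = np.linalg.eigh((centred.T * eps) @ centred)
    order = np.argsort(vals)[::-1][: k - 1]
    scores = np.empty((k - 1, k))
    for r, idx in enumerate(order):
        h = vecs[:, idx]
        if h @ gaps[r] < 0:                          # sign convention h_i^T mu_{i,i+1} > 0
            h = -h
        scores[r] = centred @ h / np.sqrt(vals[idx])
    return scores


eps, scales = [0.1, 0.2, 0.3, 0.4], [1e8, 1e4, 1.0]
scores = lemma_a3_scores(eps, scales)
limits = np.array([[lemma_a3_limit(eps, i, ip) for ip in range(1, 5)] for i in range(1, 4)])
print(np.round(scores, 3))
print(np.round(limits, 3))
```

Each row of limits, weighted by $\varepsilon_{i'}$, sums to zero, reflecting $\sum_{i'}\varepsilon_{i'}(\mu_{i'}-\mu)=0$.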


Lemma A.4. Assume Conditions 2 to 6. Then, under the condition: 

$$\operatorname*{plim}_{d\to\infty}\frac{\lambda_i}{\Delta_{i,i+1}} = c_i \in (0,\infty) \quad \text{for } i = 1,\dots,k-1, \tag{17}$$

it holds that 

$$\operatorname*{plim}_{d\to\infty} u_i^T\hat u_i = 1 \quad \text{for } u_i^T\hat u_i > 0,\ i = 1,\dots,k-1.$$

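Proof. We have that $\mathrm{var}\{\mu_{i,i+1}^T(x_j-\mu_{i'}) \mid x_j\in\Pi_{i'}\} = \mu_{i,i+1}^T\Sigma_{i'}\mu_{i,i+1} = o(\Delta_{k-1,k}^2)$ as $d\to\infty$ for $j = 1,\dots,n$; $i = 1,\dots,k-2$; $i' = 1,\dots,k$, under Condition 6. Also, from the fact that $\lambda_{i1}\le\mathrm{tr}(\Sigma_i^2)^{1/2}$, we have that $\mathrm{var}\{\mu_{k-1,k}^T(x_j-\mu_i) \mid x_j\in\Pi_i\} = \mu_{k-1,k}^T\Sigma_i\mu_{k-1,k} \le \lambda_{i1}\Delta_{k-1,k} = o(\Delta_{k-1,k}^2)$ for $j = 1,\dots,n$; $i = 1,\dots,k$, under Condition 2. Then, similar to (8), under Conditions 2 and 6, it holds that $\mu_{i,i+1}^T(x_j-\mu_{i'})/\Delta_{k-1,k} = o_p(1)$ when $x_j\in\Pi_{i'}$ for $j = 1,\dots,n$; $i = 1,\dots,k-1$; $i' = 1,\dots,k$. In addition, under Conditions 2 and 3, we can claim that $(x_j-\mu_i)^T(x_{j'}-\mu_{i'})/\Delta_{k-1,k} = o_p(1)$ and $\|x_j-\mu_i\|^2/\Delta_{k-1,k} = \mathrm{tr}(\Sigma_i)/\Delta_{k-1,k} + o_p(1)$ when $x_j\in\Pi_i$ and $x_{j'}\in\Pi_{i'}$ for all $j\ne j'$ and $i, i' = 1,\dots,k$. Here, we write that $x_j-\mu_\eta = (x_j-\mu_i)+v_j$ for $j = 1,\dots,n$; $i = 1,\dots,k$, where $\mu_\eta = \sum_{i=1}^k\eta_i\mu_i$ and $v_j = \mu_i-\mu_\eta$ when $x_j\in\Pi_i$. Then, by noting (16) with $\varepsilon_i = \eta_i$ and $\varepsilon_{(i)} = \eta_{(i)}$, $i = 1,\dots,k$, under Conditions 2, 3 and 6, we have that

$$\frac{\|x_j-\mu_\eta\|^2}{\Delta_{k-1,k}} = \frac{\|v_j\|^2+\mathrm{tr}(\Sigma_i)}{\Delta_{k-1,k}} + o_p(1) \quad\text{and}\quad \frac{(x_j-\mu_\eta)^T(x_{j'}-\mu_\eta)}{\Delta_{k-1,k}} = \frac{v_j^Tv_{j'}}{\Delta_{k-1,k}} + o_p(1)$$

when $x_j\in\Pi_i$ and $x_{j'}\in\Pi_{i'}$ for all $j\ne j'$ and $i, i' = 1,\dots,k$. Thus, under Conditions 2, 3, 4 and 6, it holds that

$$\operatorname*{plim}_{d\to\infty}\frac{(X-\mu_\eta 1_n^T)^T(X-\mu_\eta 1_n^T)-\mathrm{tr}(\Sigma_1)I_n-V^TV}{\Delta_{k-1,k}} = O, \tag{18}$$

where $V = [v_1,\dots,v_n]$.

Let $e_n$ be an arbitrary random unit vector such that $e_n^T1_n = 0$. We note that $P_n(X-\mu_\eta 1_n^T)^T(X-\mu_\eta 1_n^T)P_n/(n-1) = S_D$. Then, by noting that $P_ne_n = e_n$, under (17) and Conditions 2, 3, 4 and 6, we have that

$$\begin{aligned} e_n^T\frac{(n-1)S_D-\mathrm{tr}(\Sigma_1)P_n}{\Delta_{k-1,k}}e_n &= e_n^T\frac{\sum_{i=1}^{n-1}\{(n-1)\hat\lambda_i-\mathrm{tr}(\Sigma_1)\}\hat u_i\hat u_i^T}{\Delta_{k-1,k}}e_n\\ &= e_n^T\frac{(X-\mu_\eta 1_n^T)^T(X-\mu_\eta 1_n^T)-\mathrm{tr}(\Sigma_1)I_n}{\Delta_{k-1,k}}e_n = e_n^T\frac{V^TV}{\Delta_{k-1,k}}e_n + o_p(1) \end{aligned} \tag{19}$$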


from (18). We note that $u_i^T1_n = 0$ for $i = 1,\dots,k-1$ in case of $\mathrm{rank}(V) = k-1$. Also, we note that $\lambda_i$, $i = 1,\dots,k-1$, are distinct under Condition 5 and (17) for a sufficiently large $d$. Thus, if $u_i^T\hat u_i > 0$ for $i = 1,\dots,k-1$, we have that $u_i^T\hat u_i = 1 + o_p(1)$ for $i = 1,\dots,k-1$. This concludes the result. □
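The dual matrix $S_D$ and the property that its eigenvectors with non-zero eigenvalues are orthogonal to $1_n$ are finite-sample facts that can be checked directly. The sketch below (the function name `dual_sample_covariance` is hypothetical) confirms on random data that $P_n(X-\mu 1_n^T)^T(X-\mu 1_n^T)P_n/(n-1)$ does not depend on the centring vector $\mu$, that those eigenvectors satisfy $\hat u_i^T1_n=0$, and that the non-zero eigenvalues coincide with those of the $d\times d$ sample covariance matrix.

```python
import numpy as np


def dual_sample_covariance(X, mu):
    """S_D = P_n (X - mu 1_n^T)^T (X - mu 1_n^T) P_n / (n - 1) for a d x n data matrix X."""
    n = X.shape[1]
    P = np.eye(n) - np.ones((n, n)) / n      # centring matrix P_n
    Y = X - mu[:, None]                      # X - mu 1_n^T
    return P @ Y.T @ Y @ P / (n - 1)


rng = np.random.default_rng(1)
d, n = 200, 10                               # HDLSS-type shape, d >> n
X = rng.normal(size=(d, n))
S_D = dual_sample_covariance(X, np.zeros(d))
S_D_shift = dual_sample_covariance(X, rng.normal(size=d))   # P_n removes any mu
Xc = X - X.mean(axis=1, keepdims=True)
vals, vecs = np.linalg.eigh(S_D)
nonzero = vals > 1e-8 * vals.max()
big_vals = np.linalg.eigvalsh(Xc @ Xc.T / (n - 1))          # d x d sample covariance
print(nonzero.sum(), np.abs(np.ones(n) @ vecs[:, nonzero]).max())
```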


Lemma A.5. Assume Condition 5. For $n_i > 0$, $i = 1,\dots,k$, it holds that for $i = 1,\dots,k-1$
plim—A— = — -plim iff Uj = 1. 

d—joo Aj j_|_i 1 d —^00 

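Proof. By noting (16) with $\varepsilon_i = \eta_i$ and $\varepsilon_{(i)} = \eta_{(i)}$, $i = 1,\dots,k$, we can write that

$$\frac{VV^T}{n} = \sum_{i=1}^{k-1}\eta_{(i)}(1-\eta_{(i)})\mu_{i,i+1}\mu_{i,i+1}^T + \sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\eta_{(i)}(1-\eta_{(j)})\left(\mu_{i,i+1}\mu_{j,j+1}^T + \mu_{j,j+1}\mu_{i,i+1}^T\right). \tag{20}$$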
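Expansion (20) is an exact algebraic identity, reading $v_j=\mu_i-\sum_m\eta_m\mu_m$ for $x_j\in\Pi_i$ and $\eta_{(i)}=\eta_1+\cdots+\eta_i$ (our reading of the notation). The sketch below checks it numerically for hypothetical class sizes and means; `vvt_over_n_formula` is an illustrative name.

```python
import numpy as np


def vvt_over_n_formula(eta, mus):
    """Right-hand side of (20), with eta_(i) = eta_1 + ... + eta_i."""
    eta = np.asarray(eta, dtype=float)
    k, d = mus.shape
    gaps = mus[:-1] - mus[1:]                 # row i (0-based) is mu_{i+1,i+2}
    cum = np.cumsum(eta)                      # cum[i] = eta_(i+1)
    out = np.zeros((d, d))
    for i in range(k - 1):
        out += cum[i] * (1 - cum[i]) * np.outer(gaps[i], gaps[i])
        for j in range(i + 1, k - 1):
            cross = np.outer(gaps[i], gaps[j])
            out += cum[i] * (1 - cum[j]) * (cross + cross.T)
    return out


rng = np.random.default_rng(2)
counts = np.array([3, 5, 2, 4])               # hypothetical class sizes n_1..n_4
mus = rng.normal(size=(4, 6))                 # hypothetical class means
n = counts.sum()
eta = counts / n
labels = np.repeat(np.arange(4), counts)      # class of each x_j
V = (mus[labels] - eta @ mus).T               # columns v_j = mu_i - sum_m eta_m mu_m
direct = V @ V.T / n
print(np.allclose(direct, vvt_over_n_formula(eta, mus)))
```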


We have the eigen-decomposition of $VV^T/n$ by $VV^T/n = \sum_{i=1}^{k-1}\lambda_ih_ih_i^T$, where $h_i$ is a unit eigenvector corresponding to $\lambda_i$ for each $i$. We note that $\eta_i > 0$, $i = 1,\dots,k$, for $n_i > 0$, $i = 1,\dots,k$. Then, by noting Lemmas A.2 and A.3 and the fact that (20) is the same as (10) with $\varepsilon_i = \eta_i$, $i = 1,\dots,k$, under Condition 5, we have that for $i = 1,\dots,k-1$

= and plim 

d—>00 t d—>oo 

if $h_i^T\mu_{i,i+1} > 0$. We note that $u_{ij} = h_i^Tv_j/(n\lambda_i)^{1/2}$ from the fact that $u_i = V^Th_i/(n\lambda_i)^{1/2}$ for $i = 1,\dots,k-1$. Hence, we can conclude the result. □
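The relation $u_i = V^Th_i/(n\lambda_i)^{1/2}$ is the usual link between eigenvectors of $VV^T/n$ and those of the dual matrix $V^TV/n$. The sketch below, with hypothetical class sizes and means, forms $u_i$ this way and checks that the $u_i$ are orthonormal eigenvectors of $V^TV/n$ with eigenvalues $\lambda_i$, and that $u_i^T1_n=0$, which holds because $V1_n=\sum_j v_j=0$.

```python
import numpy as np

rng = np.random.default_rng(3)
counts = np.array([4, 6, 5])                  # hypothetical class sizes, k = 3
mus = rng.normal(size=(3, 8))                 # hypothetical class means in d = 8
n, k = counts.sum(), len(counts)
eta = counts / n
labels = np.repeat(np.arange(k), counts)
V = (mus[labels] - eta @ mus).T               # d x n, columns v_j

vals, vecs = np.linalg.eigh(V @ V.T / n)
order = np.argsort(vals)[::-1][: k - 1]
lam, H = vals[order], vecs[:, order]          # lambda_i and h_i, i = 1..k-1
U = V.T @ H / np.sqrt(n * lam)                # column i is u_i = V^T h_i / (n lambda_i)^{1/2}
print(np.round(U.T @ U, 10))
```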

A.2 Proofs of the theorems, corollaries and proposition 
A.2.1 Proofs of Theorem 1 and Corollary 1 

We note that $\mathrm{tr}(\Sigma_1)/\mathrm{tr}(\Sigma) \to 1-\varepsilon_1\varepsilon_2c$ as $d\to\infty$ under Condition 4 and $\Delta_{1,2}/\mathrm{tr}(\Sigma) \to c\ (>0)$ as $d\to\infty$. Then, by using Lemma A.1, we can conclude the result of Theorem 1.
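The limit $1-\varepsilon_1\varepsilon_2c$ can be traced to the exact decomposition $\mathrm{tr}(\Sigma)=\varepsilon_1\mathrm{tr}(\Sigma_1)+\varepsilon_2\mathrm{tr}(\Sigma_2)+\varepsilon_1\varepsilon_2\Delta_{1,2}$. Assuming, as a simplified stand-in for Condition 4, that $\mathrm{tr}(\Sigma_1)=\mathrm{tr}(\Sigma_2)$ exactly, one gets $\mathrm{tr}(\Sigma_1)/\mathrm{tr}(\Sigma)=1-\varepsilon_1\varepsilon_2\Delta_{1,2}/\mathrm{tr}(\Sigma)$ for every $d$. The sketch checks both facts numerically; `mixture_covariance` is an illustrative helper.

```python
import numpy as np


def mixture_covariance(eps, mus, sigmas):
    """Covariance of the mixture sum_i eps_i N(mu_i, Sigma_i): E[xx^T] - mu mu^T."""
    mu = sum(e * m for e, m in zip(eps, mus))
    second_moment = sum(e * (S + np.outer(m, m)) for e, m, S in zip(eps, mus, sigmas))
    return second_moment - np.outer(mu, mu)


rng = np.random.default_rng(4)
d = 30
A1, A2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Sigma1 = A1 @ A1.T
Sigma2 = A2 @ A2.T
Sigma2 *= np.trace(Sigma1) / np.trace(Sigma2)     # assumption: tr(Sigma_1) = tr(Sigma_2)
eps1, eps2 = 0.35, 0.65
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
mu12 = mu1 - mu2
Sigma = mixture_covariance([eps1, eps2], [mu1, mu2], [Sigma1, Sigma2])
c = mu12 @ mu12 / np.trace(Sigma)                  # Delta_{1,2} / tr(Sigma)
print(np.trace(Sigma1) / np.trace(Sigma), 1 - eps1 * eps2 * c)
```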

Next, we consider the proof of Corollary 1. From the fact that $1_n^TS_D1_n = 0$, it holds that $\hat u_1^T1_n = 0$ when $S_D \ne O$, so that $P_n\hat u_1 = \hat u_1$. Also, note that $\|r\|^2 = n\eta_1\eta_2$. Then, by using Lemma A.1 under Conditions 2 to 4, it holds that $\hat u_1^T\{(n-1)S_D - \mathrm{tr}(\Sigma_1)P_n\}\hat u_1/\Delta_{1,2} = n\eta_1\eta_2 + o_p(1)$ as $d\to\infty$. Hence, from (3) and the assumption that $\hat u_1^Tz_1 > 0$, we have that $\hat u_1^T\{(n\eta_1\eta_2)^{-1/2}r\} = 1 + o_p(1)$ as $d\to\infty$ for $n_i > 0$, $i = 1,2$. In view of the elements of $r$, we can conclude the result of Corollary 1.
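Corollary 1 can be illustrated by simulation. The sketch below draws a two-class HDLSS sample with $\Sigma_1=\Sigma_2=I_d$, $d=2000$ and $n=20$, forms $S_D$, and measures $\hat u_1^T\{(n\eta_1\eta_2)^{-1/2}r\}$. Because $r$ is defined in the main text and not restated here, the code assumes $r$ has entries $\eta_2$ for $x_j\in\Pi_1$ and $-\eta_1$ for $x_j\in\Pi_2$, which is consistent with $\|r\|^2=n\eta_1\eta_2$; the function name `first_dual_pc` and all simulation settings are hypothetical.

```python
import numpy as np


def first_dual_pc(d=2000, n1=8, n2=12, shift=1.0, seed=0):
    """Two-class simulation with x_j = mu_i + N(0, I_d) noise (assumed setting),
    so that Delta_{1,2} = d * shift**2 and Sigma_1 = Sigma_2 = I_d."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    mu1, mu2 = np.full(d, shift), np.zeros(d)
    X = np.hstack([mu1[:, None] + rng.standard_normal((d, n1)),
                   mu2[:, None] + rng.standard_normal((d, n2))])   # d x n data matrix
    P = np.eye(n) - np.ones((n, n)) / n
    S_D = P @ X.T @ X @ P / (n - 1)                               # dual sample covariance
    u1 = np.linalg.eigh(S_D)[1][:, -1]                            # eigenvector of largest eigenvalue
    eta1, eta2 = n1 / n, n2 / n
    r = np.concatenate([np.full(n1, eta2), np.full(n2, -eta1)])   # assumed form of r
    if u1 @ r < 0:                                                # resolve the sign ambiguity
        u1 = -u1
    return u1, r, u1 @ r / np.sqrt(n * eta1 * eta2)


u1, r, alignment = first_dual_pc()
print(round(alignment, 3))
print(np.sign(u1))
```

In this strongly separated setting the alignment should be close to 1 and the signs of $\hat u_1$ should split the two classes; with much smaller $d$ or weaker separation the agreement can be noticeably worse.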

A.2.2 Proofs of Theorem 2 and Corollary 2 

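We write that $x_j-\mu = (x_j-\mu_i) + \sum_{m=1}^k\varepsilon_m(\mu_i-\mu_m)$ for $j = 1,\dots,n$; $i = 1,\dots,k$. We note that $\mathrm{var}\{e^T(x_j-\mu_i) \mid x_j\in\Pi_i\}/\Delta_{\min} = e^T\Sigma_ie/\Delta_{\min} \le \lambda_{i1}/\Delta_{\min} = o(1)$ as $d\to\infty$ under Condition 1 for $j = 1,\dots,n$; $i = 1,\dots,k$, where $e\ (\in\mathbb{R}^d)$ is an arbitrary unit vector. Then, under Condition 1, when $x_j\in\Pi_i$, it holds that as $d\to\infty$

$$\frac{h_l^T(x_j-\mu)}{\Delta_{\min}^{1/2}} = \frac{h_l^T\sum_{m=1}^k\varepsilon_m(\mu_i-\mu_m)}{\Delta_{\min}^{1/2}} + o_p(1) \quad\text{for } l = 1,\dots,k-1.$$

Then, by using Lemmas A.2 and A.3, we can conclude the result of Theorem 2.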
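The convergence behind Theorem 2 can likewise be simulated. Assuming that the scores are $h_l^T(x_j-\mu)$ normalized by $\lambda_l^{1/2}$ (the normalization of Lemma A.3), the sketch draws observations from a three-class mixture with $\Sigma_i=I_d$ and orthogonal mean differences at widely separated scales, then compares each score with the class-wise limit from Lemma A.3. With $\Sigma_i=I_d$ the eigenvectors of $\Sigma$ coincide with those of the between-class matrix, which keeps the example simple; `class_scores`, `lemma_a3_limit` and all settings are illustrative.

```python
import numpy as np


def lemma_a3_limit(eps, i, ip):
    """Class-wise limit from Lemma A.3 (1-based i for the eigenvector, ip for the class)."""
    eps = np.asarray(eps, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(eps)))
    if ip < i:
        return 0.0
    if ip == i:
        return np.sqrt((1 - cum[i]) / (eps[i - 1] * (1 - cum[i - 1])))
    return -np.sqrt(eps[i - 1] / ((1 - cum[i]) * (1 - cum[i - 1])))


def class_scores(eps=(0.3, 0.3, 0.4), scales=(1e8, 1e4), n_per_class=10, d=50, seed=0):
    """Scores h_l^T (x_j - mu) / lambda_l^{1/2}, l = 1..k-1, assuming Sigma_i = I_d and
    orthogonal mean differences with ||mu_{m,m+1}||^2 = scales[m]."""
    rng = np.random.default_rng(seed)
    eps = np.asarray(eps, dtype=float)
    k = len(eps)
    gaps = np.zeros((k - 1, d))
    for m, s in enumerate(scales):
        gaps[m, m] = np.sqrt(s)
    mus = np.zeros((k, d))
    for m in range(k - 2, -1, -1):
        mus[m] = mus[m + 1] + gaps[m]
    mu = eps @ mus
    centred = mus - mu
    Sigma = (centred.T * eps) @ centred + np.eye(d)   # sum_i eps_i I_d = I_d
    vals, vecs = np.linalg.eigh(Sigma)
    order = np.argsort(vals)[::-1][: k - 1]
    H, lam = vecs[:, order], vals[order]              # fancy indexing returns a copy
    for l in range(k - 1):
        if H[:, l] @ gaps[l] < 0:                     # sign convention h_l^T mu_{l,l+1} > 0
            H[:, l] = -H[:, l]
    labels = np.repeat(np.arange(k), n_per_class)
    X = mus[labels] + rng.standard_normal((len(labels), d))   # rows are observations x_j
    return (X - mu) @ H / np.sqrt(lam), labels


scores, labels = class_scores()
print(np.round(scores[::10], 3))                      # one observation per class
```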

For the proof of Corollary 2, the results are obtained straightforwardly from Lemma A.2.

A.2.3 Proof of Theorem 3 

By combining Lemmas A.4 and A.5 with Theorem 2 and the assumption that $\hat u_i^Tz_i > 0$ for all $i$, the result is obtained straightforwardly.

A.2.4 Proof of Proposition 1 

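Let $\Sigma_{(*)} = \varepsilon_1\Sigma_1 + \varepsilon_2\Sigma_2$. Then, we define the eigen-decomposition of $\Sigma_{(*)}$ by $\Sigma_{(*)} = \sum_{i=1}^d\lambda_{i(*)}h_{i(*)}h_{i(*)}^T$, where $\lambda_{1(*)} \ge \cdots \ge \lambda_{d(*)} \ge 0$ are eigenvalues of $\Sigma_{(*)}$ and $h_{i(*)}$ is a unit eigenvector corresponding to $\lambda_{i(*)}$ for each $i$. Let $\Delta = \varepsilon_1\varepsilon_2\Delta_{1,2}$. Then, from $\Sigma = \Delta\mu_{1,2}\mu_{1,2}^T/\Delta_{1,2} + \Sigma_{(*)}$, under $\max_{i=1,2}\mu_{1,2}^T\Sigma_i\mu_{1,2}/\Delta_{1,2}^2 \to 0$ as $d\to\infty$, it holds that $\mu_{1,2}^T\Sigma\mu_{1,2}/(\Delta_{1,2}\Delta) \to 1$ as $d\to\infty$, so that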


E 


A 


0(1), 


( 21 ) 


/ 1 /2 

where 1 I 12 — /^i, 2/^1 2 - ~ ^ ~ l,...,d. For a sufficiently large 

d, when k( 1) > 0, there exists some positive integer such that = max{z|K(f) > 
0 for i = 1, ...,d}. Then, from ([2T]) . we have that 1 ^ 1 , 2 )“^ — o(l)) so that Aj^/A = 

1 + 0 ( 1 ) with f* = + 1. When k( 1) < 0 for a sufficiently large d, it holds that Aj^/A = 

$1 + o(1)$ with $i_* = 1$. In addition, under $\liminf_{d\to\infty}|\lambda_{i'}/\lambda_{i_*} - 1| > 0$ for $i' = 1,\dots,d$ ($i' \ne i_*$), it holds that $h_{i_*}^T\mu_{1,2}/\Delta_{1,2}^{1/2} = 1 + o(1)$ from $h_{i_*}^T\mu_{1,2} > 0$. Then, from the fact that $h_{i_*}^T\Sigma_ih_{i_*}/\Delta \to 0$ as $d\to\infty$ for $i = 1,2$, in a way similar to (8), we have that


$$\frac{h_{i_*}^T(x_j-\mu)}{\lambda_{i_*}^{1/2}} = \frac{h_{i_*}^T(\mu_i-\mu)}{\lambda_{i_*}^{1/2}} + o_p(1)$$

when $x_j\in\Pi_i$ for $j = 1,\dots,n$; $i = 1,2$. We can conclude the results.